Flexible region based sample adaptive offset (SAO) and adaptive loop filter (ALF)

ABSTRACT

A method for in-loop filtering in a video encoder is provided that includes determining filter parameters for each filtering region of a plurality of filtering regions of a reconstructed picture, applying in-loop filtering to each filtering region according to the filter parameters determined for the filtering region, and signaling the filter parameters for each filtering region in an encoded video bit stream, wherein the filter parameters for each filtering region are signaled after encoded data of a final largest coding unit (LCU) in the filtering region, wherein the in-loop filtering is selected from a group consisting of adaptive loop filtering and sample adaptive offset filtering.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/156,097, filed May 16, 2016, which is a continuation of U.S. patent application Ser. No. 13/594,701, filed Aug. 24, 2012 (now U.S. Pat. No. 9,344,743), which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/526,975, filed Aug. 24, 2011, all of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

Field of the Invention

Embodiments of the present invention generally relate to flexible region based sample adaptive offset (SAO) and adaptive loop filter (ALF) in video coding.

Description of the Related Art

The Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T WP3/16 and ISO/IEC JTC 1/SC 29/WG 11 is currently developing the next-generation video coding standard referred to as High Efficiency Video Coding (HEVC). Similar to previous video coding standards such as H.264/AVC, HEVC is based on a hybrid coding scheme using block-based prediction and transform coding. First, the input signal is split into rectangular blocks that are predicted from the previously decoded data by either motion compensated (inter) prediction or intra prediction. The resulting prediction error is coded by applying block transforms based on an integer approximation of the discrete cosine transform, which is followed by quantization and coding of the transform coefficients. While H.264/AVC divides a picture into fixed size macroblocks of 16×16 samples, HEVC divides a picture into largest coding units (LCUs) of 16×16, 32×32, or 64×64 samples. The LCUs may be further divided into smaller blocks, i.e., coding units (CUs), using a quad-tree structure. A CU may be split further into prediction units (PUs) and transform units (TUs). The size of the transforms used in prediction error coding can vary from 4×4 to 32×32 samples, thus allowing larger transforms than in H.264/AVC, which uses 4×4 and 8×8 transforms. As the optimal size of the above mentioned blocks typically depends on the picture content, the reconstructed picture is composed of blocks of various sizes, each block being coded using an individual prediction mode and the prediction error transform.

In a coding scheme that uses block-based prediction, transform coding, and quantization, some characteristics of the compressed video data may differ from the original video data. For example, discontinuities referred to as blocking artifacts can occur in the reconstructed signal at block boundaries. Further, the intensity of the compressed video data may be shifted. Such intensity shift may also cause visual impairments or artifacts. To help reduce such artifacts in decompressed video, the emerging HEVC standard defines three in-loop filters: a deblocking filter to reduce blocking artifacts, a sample adaptive offset filter (SAO) to reduce distortion caused by intensity shift, and an adaptive loop filter (ALF) to minimize the mean squared error (MSE) between reconstructed video and original video. These filters may be applied sequentially, and, depending on the configuration, the SAO and ALF loop filters may be applied to the output of the deblocking filter.

SUMMARY

Embodiments of the present invention relate to methods, apparatus, and computer readable media for region based in-loop filtering in video coding. In one aspect, a method for in-loop filtering in a video encoder is provided that includes determining filter parameters for each filtering region of a plurality of filtering regions of a reconstructed picture, applying in-loop filtering to each filtering region according to the filter parameters determined for the filtering region, and signaling the filter parameters for each filtering region in an encoded video bit stream, wherein the filter parameters for each filtering region are signaled after encoded data of a final largest coding unit (LCU) in the filtering region, wherein the in-loop filtering is selected from a group consisting of adaptive loop filtering and sample adaptive offset filtering.

In one aspect, a method for in-loop filtering in a video encoder is provided that includes partitioning largest coding units (LCUs) of a reconstructed picture into N×1 LCU aligned filtering regions, wherein N is an integer, determining filter parameters for each filtering region, applying in-loop filtering to each filtering region according to the filter parameters determined for the filtering region, and signaling the filter parameters for each filtering region in an encoded video bit stream, wherein the in-loop filtering is selected from a group consisting of adaptive loop filtering and sample adaptive offset filtering.

In one aspect, a method for in-loop filtering of coded video data is provided that includes receiving reconstructed video data corresponding to the coded video data, and applying in-loop filtering to each filtering region of a plurality of filtering regions of the reconstructed video data according to filter parameters determined for the filtering region, wherein the in-loop filtering is one selected from a group consisting of adaptive loop filtering and sample adaptive offset filtering, wherein the plurality of filtering regions are determined by partitioning largest coding units (LCUs) of the reconstructed video data into N×1 LCU aligned regions, wherein N is an integer.

BRIEF DESCRIPTION OF THE DRAWINGS

Particular embodiments will now be described, by way of example only, and with reference to the accompanying drawings:

FIG. 1 is an example of quadtree based region partitioning of a picture for sample adaptive offset (SAO) filtering;

FIG. 2 illustrates band offset (BO) classification in SAO filtering;

FIG. 3A illustrates edge offset (EO) classification patterns in SAO filtering;

FIG. 3B illustrates edge types by EO category;

FIG. 4 illustrates 4×4 LCU aligned filtering regions;

FIG. 5 is an example of prior art slice based SAO and ALF parameter signaling;

FIG. 6 is an example of LCUs in raster scan order in 4×4 LCU aligned filtering regions;

FIG. 7 is a block diagram of a digital system;

FIG. 8 is a block diagram of a video encoder;

FIG. 9 is a block diagram of the in-loop filter component of the video encoder;

FIG. 10 is a block diagram of a video decoder;

FIG. 11 is a block diagram of the in-loop filter component of the video decoder;

FIG. 12 is a flow diagram of a method for region-based in-loop filtering in an encoder;

FIG. 13 illustrates 16×1 LCU aligned filtering regions;

FIG. 14 is an example of LCUs in raster scan order in 16×1 LCU aligned filtering regions;

FIG. 15 is an example of region based filter parameter signaling with 4×4 LCU aligned filtering regions;

FIG. 16 is an example of region based filter parameter signaling with 16×1 LCU aligned filtering regions;

FIG. 17 is a flow diagram of a method for region-based in-loop filtering in a decoder; and

FIG. 18 is a block diagram of an illustrative digital system.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.

As used herein, the term “picture” may refer to a frame or a field of a frame. A frame is a complete image captured during a known time interval. For convenience of description, embodiments of the invention are described herein in reference to HEVC. One of ordinary skill in the art will understand that embodiments of the invention are not limited to HEVC. In HEVC, a largest coding unit (LCU) is the base unit used for block-based coding. A picture is divided into non-overlapping LCUs. That is, an LCU plays a similar role in coding as the macroblock of H.264/AVC, but it may be larger, e.g., 32×32, 64×64, etc. An LCU may be partitioned into coding units (CUs). A CU is a block of pixels within an LCU, and the CUs within an LCU may be of different sizes. The partitioning is a recursive quadtree partitioning. The quadtree is split according to various criteria until a leaf is reached, which is referred to as the coding node or coding unit. The maximum hierarchical depth of the quadtree is determined by the size of the smallest CU (SCU) permitted. The coding node is the root node of two trees, a prediction tree and a transform tree. A prediction tree specifies the position and size of prediction units (PUs) for a coding unit. A transform tree specifies the position and size of transform units (TUs) for a coding unit. A transform unit may not be larger than a coding unit, and the size of a transform unit may be 4×4, 8×8, 16×16, or 32×32. The sizes of the transform units and prediction units for a CU are determined by the video encoder during prediction based on minimization of rate/distortion costs.

An LCU-aligned region of a picture is a region in which the region boundaries are also LCU boundaries. It is recognized that the dimensions of a picture and the dimensions of an LCU may not allow a picture to be evenly divided into LCUs. There may be blocks at the bottom of the picture or the right side of the picture that are smaller than the actual LCU size, i.e., partial LCUs. These partial LCUs are mostly treated as if they were full LCUs and are referred to as LCUs.

Various versions of HEVC are described in the following documents, which are incorporated by reference herein: T. Wiegand, et al., “WD3: Working Draft 3 of High-Efficiency Video Coding,” JCTVC-E603, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Geneva, CH, Mar. 16-23, 2011 (“WD3”); B. Bross, et al., “WD4: Working Draft 4 of High-Efficiency Video Coding,” JCTVC-F803_d6, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Torino, IT, Jul. 14-22, 2011 (“WD4”); B. Bross, et al., “WD5: Working Draft 5 of High-Efficiency Video Coding,” JCTVC-G1103_d9, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Geneva, CH, Nov. 21-30, 2011 (“WD5”); B. Bross, et al., “High Efficiency Video Coding (HEVC) Text Specification Draft 6,” JCTVC-H1003, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Geneva, CH, Nov. 21-30, 2011 (“HEVC Draft 6”); B. Bross, et al., “High Efficiency Video Coding (HEVC) Text Specification Draft 7,” JCTVC-I1003_d0, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Geneva, CH, Apr. 17-May 7, 2012 (“HEVC Draft 7”); and B. Bross, et al., “High Efficiency Video Coding (HEVC) Text Specification Draft 8,” JCTVC-J1003_d7, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Stockholm, SE, Jul. 11-20, 2012 (“HEVC Draft 8”).

As previously mentioned, a sample adaptive offset (SAO) filter and an adaptive loop filter (ALF) are two of the in-loop filters included in various versions of the emerging HEVC standard. These in-loop filters are applied both in the encoder and the decoder. SAO may be applied to reconstructed pixels after application of a deblocking filter and prior to adaptive loop filtering.

In general, SAO involves adding an offset directly to a reconstructed pixel to compensate for intensity shift. The value of the offset depends on the local characteristics surrounding the pixel, i.e., edge direction/shape and/or pixel intensity level. There are two techniques used for determining offset values: band offset (BO) and edge offset (EO). In previous HEVC specifications, e.g., WD4 and WD5, for purposes of SAO, seven SAO filter types are defined: two types of BO, four types of EO, and one type for no SAO. These types are described in more detail below.

The encoder divides a reconstructed picture into LCU-aligned regions according to a top-down quadtree partitioning and decides which of the SAO filter types is to be used for each region. Each region in a partitioning contains one or more LCUs. More specifically, the encoder decides the best LCU quadtree partitioning and the SAO filter type and associated offsets for each region based on a rate distortion technique that estimates the coding cost resulting from the use of each SAO filter type. For each possible region partitioning, the encoder estimates the coding costs of the SAO parameters, e.g., the SAO filter type and SAO offsets, resulting from using each of the predefined SAO filter types for each region, selects the SAO filter type with the lowest cost for the region, and estimates an aggregate coding cost for the partitioning from the region coding costs. The partitioning with the lowest aggregate cost is selected for the picture. An example of an LCU aligned quadtree partitioning of a picture into regions for purposes of SAO is shown in FIG. 1.

For BO, the pixels of a region are classified into multiple bands where each band contains pixels in the same intensity interval. That is, the intensity range is equally divided into 32 bands from zero to the maximum intensity value (e.g., 255 for 8-bit pixels). Based on the observation that an offset tends to become zero when the number of pixels in a band is large, especially for central bands, the 32 bands are divided into two groups, the central 16 bands and two side bands, as shown in FIG. 2. Each pixel in a region is classified according to its intensity into one of two categories: the side band group or the central band group. The five most significant bits of a pixel are used as the band index for purposes of classification. An offset is also determined for each band of the central group and each band of the side band group. The offset for a band may be computed as an average of the differences between the original pixel values and the reconstructed pixel values of the pixels in the region classified into the band.
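By way of illustration only, the following C sketch shows the BO classification and offset derivation just described, assuming 8-bit pixels. The function and variable names are hypothetical, and the placement of the central group at band indices 8-23 is an assumption of the sketch rather than a detail given in the text.

    /* Illustrative band offset (BO) classification for 8-bit pixels.
     * The five most significant bits of a pixel give its band index
     * (0..31); this sketch assumes bands 8..23 form the central group
     * and the remaining bands the side group. */
    static int band_index(int pixel)
    {
        return (pixel >> 3) & 31;   /* five MSBs of an 8-bit sample */
    }

    /* Offset for one band: average of (original - reconstructed) over
     * the pixels of the region classified into that band. */
    static int band_offset(const unsigned char *orig,
                           const unsigned char *rec,
                           int num_pixels, int band)
    {
        long sum = 0;
        int count = 0;
        for (int i = 0; i < num_pixels; i++) {
            if (band_index(rec[i]) == band) {
                sum += orig[i] - rec[i];
                count++;
            }
        }
        return count ? (int)(sum / count) : 0;
    }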

For EO, pixels in a region are classified based on a one dimensional (1-D) delta calculation. That is, the pixels can be filtered in one of four edge directions (0, 90, 135, and 45 degrees) as shown in FIG. 3A. For each edge direction, a pixel is classified into one of five categories based on the intensity of the pixel relative to neighboring pixels in the edge direction. Categories 1-4 each represent specific edge shapes as shown in FIG. 3B, while category 0 indicates that none of these edge shapes applies. Offsets for each of categories 1-4 are also computed after the pixels are classified.

More specifically, for each edge direction, a category number c for a pixel is computed as c = sign(p0−p1) + sign(p0−p2), where p0 is the pixel and p1 and p2 are neighboring pixels as shown in FIG. 3A. The edge conditions that result in classifying a pixel into a category are shown in Table 1 and are also illustrated in FIG. 3B. After the pixels are classified, offsets are generated for each of categories 1-4. The offset for a category may be computed as an average of the differences between the original pixel values and the reconstructed pixel values of the pixels in the region classified into the category.

TABLE 1

  Category   Condition
  1          p0 < p1 and p0 < p2
  2          (p0 < p1 and p0 = p2) or (p0 < p2 and p0 = p1)
  3          (p0 > p1 and p0 = p2) or (p0 > p2 and p0 = p1)
  4          p0 > p1 and p0 > p2
  0          none of the above
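The category computation of Table 1 may be illustrated with a short C sketch; the mapping below follows directly from c = sign(p0−p1) + sign(p0−p2), with names chosen for illustration only.

    /* Illustrative edge offset (EO) classification for one edge
     * direction; p1 and p2 are the two neighbors of p0 along that
     * direction (FIG. 3A). Returns the category of Table 1. */
    static int sign3(int x) { return (x > 0) - (x < 0); }

    static int eo_category(int p0, int p1, int p2)
    {
        switch (sign3(p0 - p1) + sign3(p0 - p2)) {
        case -2: return 1;  /* p0 < p1 and p0 < p2 (local minimum)  */
        case -1: return 2;  /* p0 < one neighbor, equals the other  */
        case  1: return 3;  /* p0 > one neighbor, equals the other  */
        case  2: return 4;  /* p0 > p1 and p0 > p2 (local maximum)  */
        default: return 0;  /* none of the above                    */
        }
    }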

Once the partitioning of the LCUs into regions and the SAO filter type and offsets for each region are determined, the encoder applies the selected SAO offsets to the reconstructed picture according to the selected LCU partitioning and the selected SAO filter type for each region in the partitioning. The offsets are applied as follows. If SAO filter type 0 is selected for a region, no offset is applied. If one of SAO filter types 1-4 is selected for a region, for each pixel in the region, the category of the pixel (see Table 1) is determined as previously described and the offset for that category is added to the pixel. If the pixel is in category 0, no offset is added.

If one of the two BO SAO filter types, i.e., SAO filter types 5 and 6, is selected for a region, for each pixel in the region, the band of the pixel is determined as previously described. If the pixel is in one of the bands for the SAO filter type, i.e., one of the central bands for SAO filter type 5 or one of the side bands for SAO filter type 6, the offset for that band is added to the pixel; otherwise, the pixel is not changed.
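Combining the two preceding paragraphs, a hedged C sketch of the per-pixel offset application, keyed by the sao_type_idx values of Table 2 below, might look as follows. It reuses the classification sketches above; the mapping from side bands to offset array indices is an assumption of the sketch.

    /* Illustrative application of SAO to one pixel. eo_offsets holds
     * the four category offsets; band_offsets holds the 16 band
     * offsets of the selected band group. */
    static int clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

    static int apply_sao_pixel(int sao_type_idx, int p0, int p1, int p2,
                               const int eo_offsets[4],
                               const int band_offsets[16])
    {
        if (sao_type_idx >= 1 && sao_type_idx <= 4) {   /* EO types  */
            int cat = eo_category(p0, p1, p2);
            return cat ? clip255(p0 + eo_offsets[cat - 1]) : p0;
        }
        if (sao_type_idx == 5 || sao_type_idx == 6) {   /* BO types  */
            int band = band_index(p0);
            int central = (band >= 8 && band <= 23);
            if (sao_type_idx == 5 && central)
                return clip255(p0 + band_offsets[band - 8]);
            if (sao_type_idx == 6 && !central)
                return clip255(p0 + band_offsets[band < 8 ? band
                                                          : band - 16]);
        }
        return p0;   /* type 0: no SAO; or band not in selected group */
    }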

Further, for each picture, the encoder signals SAO parameters such as the LCU region partitioning for SAO, the SAO filter type for each LCU region, and the offsets for each LCU region in the encoded bit stream. Table 2 shows the SAO filter types (sao_type_idx) and the number of SAO offsets (NumSaoCategory) that are signaled for each filter type. Note that as many as sixteen offsets may be signaled for a region. For SAO filter types 1-4, the four offsets are signaled in category order (see Table 1). For SAO filter types 5 and 6, the 16 offsets are signaled in band order (lowest to highest).

TABLE 2

  sao_type_idx   NumSaoCategory   Edge type
  0              0                Not applied
  1              4                1D 0-degree edge
  2              4                1D 90-degree edge
  3              4                1D 135-degree edge
  4              4                1D 45-degree edge
  5              16               Central band
  6              16               Side band

In a decoder, the SAO parameters for a slice are decoded, and SAO filtering is applied according to the parameters. That is, the decoder applies SAO offsets to the LCUs in the slice according to the signaled region partitioning for the picture and the signaled SAO filter type and offsets for each of the regions. The offsets for a given region are applied in the same way as previously described for the encoder.

In general, ALF selectively applies a 10-tap FIR filter to reconstructed pixels in a picture (after deblocking filtering and SAO filtering). In previous versions of the HEVC standard, several filter shapes are defined, and the encoder selects one filter shape for a picture and up to 16 sets of coefficients for the filter shape. The selected filter shape and the sets of coefficients are signaled to the decoder in slice headers. Typically, the encoder uses a Wiener filter technique to choose coefficients that minimize the SSE (sum of squared errors) between the reconstructed pixels and the original pixels.

Two types of ALF filtering are provided: block based and region based. In block based ALF, a picture is divided into 4×4 blocks of pixels and each block is classified into one of 16 categories. The category of the block determines which of the coefficient sets (out of a maximum of 16 coefficient sets) is to be used in applying the selected filter to the pixels in the block. Filtering may also be turned on and off on a CU basis. The encoder determines whether or not ALF is to be applied to each CU and signals a map to the decoder that indicates whether or not ALF is to be used for each CU. To apply the selected filter to a picture, ALF uses a Laplacian-based local activity measure to switch between the sets of filter coefficients on a 4×4 block-by-block basis.
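A simple, illustrative form of such a Laplacian-based activity measure is sketched below; the actual classification in the HEVC drafts differs in its details, so this only indicates the idea of switching coefficient sets on local activity.

    #include <stdlib.h>   /* abs() */

    /* Illustrative Laplacian-based activity measure for one 4x4 block.
     * pix must point inside a picture so that the one-pixel
     * neighborhood used below is available. */
    static int block_activity(const unsigned char *pix, int stride)
    {
        int act = 0;
        for (int y = 0; y < 4; y++) {
            for (int x = 0; x < 4; x++) {
                int p = pix[y * stride + x];
                /* second differences (Laplacian) in both directions */
                act += abs(2 * p - pix[y * stride + x - 1]
                                 - pix[y * stride + x + 1]);
                act += abs(2 * p - pix[(y - 1) * stride + x]
                                 - pix[(y + 1) * stride + x]);
            }
        }
        return act;  /* quantized elsewhere into one of 16 categories */
    }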

In region based ALF, a picture is divided into sixteen LCU aligned filtering regions, i.e., 4×4 regions of LCUs, as shown in FIG. 4. Each filtering region is classified into one of 16 categories, which determines which of the coefficient sets is to be used in applying the selected filter to the pixels in the filtering region. The encoder selects the coefficient set for each filtering region and signals the selection to the decoder.

The dimensions of the filtering regions in terms of LCUs depend on the dimensions of the picture and the dimensions of an LCU. The region dimensions may be determined as follows:

xWidth = (((PicWidthInLCUs + 1) >> 2) << Log2LCUSize)

yHeight = (((PicHeightInLCUs + 1) >> 2) << Log2LCUSize)

where xWidth is the width of a filtering region and yHeight is the height of a filtering region when the picture is divided into 4×4 regions, and Log2LCUSize is log2(LCUSize), e.g., if the LCU size is 64, Log2LCUSize will be 6.
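These two formulas transcribe directly into C; the function names below are illustrative only.

    /* Direct transcription of the two region-dimension formulas. */
    static int region_width(int pic_width_in_lcus, int log2_lcu_size)
    {
        return ((pic_width_in_lcus + 1) >> 2) << log2_lcu_size;
    }

    static int region_height(int pic_height_in_lcus, int log2_lcu_size)
    {
        return ((pic_height_in_lcus + 1) >> 2) << log2_lcu_size;
    }

    /* Example: a 1920-pixel-wide picture with 64x64 LCUs has
     * PicWidthInLCUs = 30, so xWidth = ((30 + 1) >> 2) << 6 = 448. */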

The picture-based processing of SAO to estimate the offsets, determine whether to use EO or BO, and determine region configurations based on a quadtree can be an issue for low latency video coding applications (e.g., video conferencing or cloud computing) as such processing introduces a minimum of a one picture delay. More specifically, as shown in the example of FIG. 5, the SAO parameters for LCUs in a slice of a picture are encoded in the slice header. Due to the picture based SAO processing, these parameters are not known until all the LCUs in a picture have been coded. A delay in LCU processing is also incurred in the decoder, as all data for SAO of LCUs in a slice has to be decoded and stored before processing of the LCU data in the slice can begin. Moreover, the decoded SAO parameters for the entire slice have to be stored before LCU decoding is started, which may increase the memory requirements in a decoder.

The region-based processing of ALF also introduces some delay in the encoder. As shown in the example of FIG. 5, the ALF parameters for LCUs in a slice are signaled in the slice header. These parameters, which include, e.g., filter coefficients and on/off flags, are not known until the filtering regions containing those LCUs are processed. Because ALF coefficients are determined independently for each region, the encoder may process regions in parallel to determine coefficients, which reduces the latency but does not eliminate it. However, the determination of ALF coefficients for a filtering region cannot be started until all the LCUs in the filtering region have been coded, reconstructed, and deblocked in the encoder. Consider the simple example of FIG. 6. In this example, a picture with 16 rows of 16 LCUs is divided into the 16 LCU aligned 4×4 filtering regions. Assuming raster scan order, before an encoder can begin determining the ALF coefficients for filtering region R0, at least LCUs 0-51 have to be processed by the encoder and the LCUs in the region have to be reconstructed and deblocked.
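The figure of LCUs 0-51 follows directly from raster-scan order, as the short sketch below makes concrete; the function name is illustrative.

    /* In raster-scan order, the last LCU of the top-left region to be
     * encoded is its bottom-right LCU. */
    static int last_lcu_of_top_left_region(int pic_width_in_lcus,
                                           int region_w_in_lcus,
                                           int region_h_in_lcus)
    {
        return (region_h_in_lcus - 1) * pic_width_in_lcus
             + (region_w_in_lcus - 1);
    }
    /* last_lcu_of_top_left_region(16, 4, 4) == 3 * 16 + 3 == 51 */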

Embodiments of the invention provide alternative techniques for determining and signaling ALF and SAO parameters that may be used to reduce the encoder delay of current techniques. In some embodiments, a region based SAO is provided that enables the determination of SAO parameters for the filtering regions to be performed in parallel. In some such embodiments, rather than signaling the SAO parameters in a slice header, the SAO parameters for each filtering region in a slice may be signaled in the slice data at the end of the region data, i.e., the region SAO parameters may be interleaved with the region data. In some embodiments, for region-based ALF, rather than signaling the ALF parameters in a slice header, the ALF parameters for each filtering region in a slice may be signaled in the slice data at the end of the region data, i.e., the region ALF parameters may be interleaved with the region data. In some embodiments, an alternative region configuration for region-based determination of ALF parameters and/or SAO parameters is provided that may reduce the delay caused by the current ALF region configuration in combination with raster scan processing of LCUs.

FIG. 7 shows a block diagram of a digital system that includes a source digital system 700 that transmits encoded video sequences to a destination digital system 702 via a communication channel 716. The source digital system 700 includes a video capture component 704, a video encoder component 706, and a transmitter component 708. The video capture component 704 is configured to provide a video sequence to be encoded by the video encoder component 706. The video capture component 704 may be, for example, a video camera, a video archive, or a video feed from a video content provider. In some embodiments, the video capture component 704 may generate computer graphics as the video sequence, or a combination of live video, archived video, and/or computer-generated video.

The video encoder component 706 receives a video sequence from the video capture component 704 and encodes it for transmission by the transmitter component 708. The video encoder component 706 receives the video sequence from the video capture component 704 as a sequence of pictures, divides the pictures into largest coding units (LCUs), and encodes the video data in the LCUs. The video encoder component 706 may be configured to perform region based SAO and/or region based ALF filtering during the encoding process as described herein. An embodiment of the video encoder component 706 is described in more detail herein in reference to FIG. 8.

The transmitter component 708 transmits the encoded video data to the destination digital system 702 via the communication channel 716. The communication channel 716 may be any communication medium, or combination of communication media, suitable for transmission of the encoded video sequence, such as, for example, wired or wireless communication media, a local area network, or a wide area network.

The destination digital system 702 includes a receiver component 710, a video decoder component 712, and a display component 714. The receiver component 710 receives the encoded video data from the source digital system 700 via the communication channel 716 and provides the encoded video data to the video decoder component 712 for decoding. The video decoder component 712 reverses the encoding process performed by the video encoder component 706 to reconstruct the LCUs of the video sequence. The video decoder component 712 may be configured to perform region based SAO and/or region based ALF filtering during the decoding process as described herein. An embodiment of the video decoder component 712 is described in more detail below in reference to FIG. 10.

The reconstructed video sequence is displayed on the display component 714. The display component 714 may be any suitable display device such as, for example, a plasma display, a liquid crystal display (LCD), a light emitting diode (LED) display, etc.

In some embodiments, the source digital system 700 may also include a receiver component and a video decoder component and/or the destination digital system 702 may include a transmitter component and a video encoder component for transmission of video sequences in both directions for video streaming, video broadcasting, and video telephony. Further, the video encoder component 706 and the video decoder component 712 may perform encoding and decoding in accordance with one or more video compression standards. The video encoder component 706 and the video decoder component 712 may be implemented in any suitable combination of software, firmware, and hardware, such as, for example, one or more digital signal processors (DSPs), microprocessors, discrete logic, application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), etc.

FIG. 8 shows a block diagram of the LCU processing portion of an example video encoder. A coding control component (not shown) sequences the various operations of the LCU processing, i.e., the coding control component runs the main control loop for video encoding. The coding control component receives a digital video sequence and performs any processing on the input video sequence that is to be done at the picture level, such as determining the coding type (I, P, or B) of a picture based on the high level coding structure, e.g., IPPP, IBBP, hierarchical-B, and dividing a picture into LCUs for further processing.

In addition, for pipelined architectures in which multiple LCUs may be processed concurrently in different components of the LCU processing, the coding control component controls the processing of the LCUs by various components of the LCU processing in a pipeline fashion. For example, in many embedded systems supporting video processing, there may be one master processor and one or more slave processing modules, e.g., hardware accelerators. The master processor operates as the coding control component and runs the main control loop for video encoding, and the slave processing modules are employed to offload certain compute-intensive tasks of video encoding such as motion estimation, motion compensation, intra prediction mode estimation, transformation and quantization, entropy coding, and loop filtering. The slave processing modules are controlled in a pipeline fashion by the master processor such that the slave processing modules operate on different LCUs of a picture at any given time. That is, the slave processing modules are executed in parallel, each processing its respective LCU, while data movement from one processor to another is serial.

The LCU processing receives LCUs of the input video sequence from the coding control component and encodes the LCUs under the control of the coding control component to generate the compressed video stream. The LCUs in each picture are processed in row order. The CUs in the CU structure of an LCU may be processed by the LCU processing in a depth-first Z-scan order. The LCUs 800 from the coding control unit are provided as one input of a motion estimation component 820, as one input of an intra-prediction component 824, and to a positive input of a combiner 802 (e.g., adder or subtractor or the like). Further, although not specifically shown, the prediction mode of each picture as selected by the coding control component is provided to a mode selector component and the entropy encoder 834.

The storage component 818 provides reference data to the motion estimation component 820 and to the motion compensation component 822. The reference data may include one or more previously encoded and decoded pictures, i.e., reference pictures.

The motion estimation component 820 provides motion data information to the motion compensation component 822 and the entropy encoder 834. More specifically, the motion estimation component 820 performs tests on CUs in an LCU based on multiple inter-prediction modes (e.g., skip mode, merge mode, and normal or direct inter-prediction), PU sizes, and TU sizes using reference picture data from storage 818 to choose the best CU partitioning, PU/TU partitioning, inter-prediction modes, motion vectors, etc. based on a rate distortion coding cost. To perform the tests, the motion estimation component 820 may divide an LCU into CUs according to the maximum hierarchical depth of the quadtree, divide each CU into PUs according to the unit sizes of the inter-prediction modes and into TUs according to the transform unit sizes, and calculate the coding costs for each PU size, prediction mode, and transform unit size for each CU.

The motion estimation component 820 provides the motion vector (MV) or vectors and the prediction mode for each PU in the selected CU partitioning to the motion compensation component 822 and the selected CU/PU/TU partitioning with corresponding motion vector(s), reference picture index (indices), and prediction direction(s) (if any) to the entropy encoder 834.

The motion compensation component 822 provides motion compensated inter-prediction information to the mode decision component 826 that includes motion compensated inter-predicted PUs, the selected inter-prediction modes for the inter-predicted PUs, and corresponding TU sizes for the selected CU partitioning. The coding costs of the inter-predicted CUs are also provided to the mode decision component 826.

The intra-prediction component 824 provides intra-prediction information to the mode decision component 826 and the entropy encoder 834. More specifically, the intra-prediction component 824 performs intra-prediction in which tests on CUs in an LCU based on multiple intra-prediction modes, PU sizes, and TU sizes are performed using reconstructed data from previously encoded neighboring CUs stored in the buffer 828 to choose the best CU partitioning, PU/TU partitioning, and intra-prediction modes based on a rate distortion coding cost. To perform the tests, the intra-prediction component 824 may divide an LCU into CUs according to the maximum hierarchical depth of the quadtree, divide each CU into PUs according to the unit sizes of the intra-prediction modes and into TUs according to the transform unit sizes, and calculate the coding costs for each PU size, prediction mode, and transform unit size for each PU. The intra-prediction information provided to the mode decision component 826 includes the intra-predicted PUs, the selected intra-prediction modes for the PUs, and the corresponding TU sizes for the selected CU partitioning. The coding costs of the intra-predicted CUs are also provided to the mode decision component 826. The intra-prediction information provided to the entropy encoder 834 includes the selected CU/PU/TU partitioning with corresponding intra-prediction modes.

The mode decision component 826 selects between intra-prediction of a CU and inter-prediction of a CU based on the intra-prediction coding cost of the CU from the intra-prediction component 824, the inter-prediction coding cost of the CU from the inter-prediction component 820, and the picture prediction mode provided by the mode selector component. Based on the decision as to whether a CU is to be intra- or inter-coded, the intra-predicted PUs or inter-predicted PUs are selected, accordingly.

The output of the mode decision component 826, i.e., the predicted PUs, is provided to a negative input of the combiner 802 and to a delay component 830. The associated transform unit size is also provided to the transform component 804. The output of the delay component 830 is provided to another combiner (i.e., an adder) 838. The combiner 802 subtracts each predicted PU from the original PU to provide residual PUs to the transform component 804. Each resulting residual PU is a set of pixel difference values that quantify differences between pixel values of the original PU and the predicted PU. The residual blocks of all the PUs of a CU form a residual CU block for the transform component 804.

The transform component 804 performs block transforms on the residual CU to convert the residual pixel values to transform coefficients and provides the transform coefficients to a quantize component 806. More specifically, the transform component 804 receives the transform unit sizes for the residual CU and applies transforms of the specified sizes to the CU to generate transform coefficients.

The quantize component 806 quantizes the transform coefficients based on quantization parameters (QPs) and quantization matrices provided by the coding control component and the transform sizes. The quantized transform coefficients are taken out of their scan ordering by a scan component 808 and arranged sequentially for entropy coding. In essence, the coefficients are scanned backward in highest to lowest frequency order until a coefficient with a non-zero value is located. Once the first coefficient with a non-zero value is located, that coefficient and all remaining coefficient values following the coefficient in the highest to lowest frequency scan order are serialized and passed to the entropy encoder 834.
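The backward scan described above may be sketched as follows; the function name and array representation are illustrative only.

    /* Illustrative backward scan: starting from the highest-frequency
     * position of the scan-ordered coefficients, skip zeros until the
     * first non-zero value; that coefficient and all coefficients
     * preceding it in scan order are passed to the entropy encoder. */
    static int num_coeffs_to_code(const int *scan_ordered, int n)
    {
        int last = n - 1;
        while (last >= 0 && scan_ordered[last] == 0)
            last--;
        return last + 1;   /* 0 when all coefficients are zero */
    }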

The entropy encoder 834 entropy encodes the relevant data, i.e., syntax elements, output by the various encoding components and the coding control component to generate the compressed video bit stream that is provided to a video buffer 836 for transmission or storage. The syntax elements are encoded according to the syntactical order specified in HEVC. This syntactical order specifies the order in which syntax elements should occur in a compressed video bit stream. Among the syntax elements that are encoded are flags indicating the CU/PU/TU partitioning of an LCU, the prediction modes for the CUs, and the ordered quantized transform coefficients for the CUs. The entropy encoder 834 also codes relevant data from the in-loop filtering component 816 such as ALF parameters, e.g., filter type and filter coefficients, and SAO parameters, e.g., filter type and offsets.

The LCU processing includes an embedded decoder. As any compliant decoder is expected to reconstruct an image from a compressed bit stream, the embedded decoder provides the same utility to the video encoder. Knowledge of the reconstructed input allows the video encoder to transmit the appropriate residual energy to compose subsequent pictures. To determine the reconstructed input, i.e., reference data, the ordered quantized transform coefficients for a CU provided via the scan component 808 are returned to their original post-transform arrangement by an inverse scan component 810, the output of which is provided to a dequantize component 812, which outputs a reconstructed version of the transform result from the transform component 804.

The dequantized transform coefficients are provided to the inverse transform component 814, which outputs estimated residual information representing a reconstructed version of a residual CU. The inverse transform component 814 receives the transform unit size used to generate the transform coefficients and applies inverse transform(s) of the specified size to the transform coefficients to reconstruct the residual values.

The reconstructed residual CU is provided to the combiner 838. The combiner 838 adds the delayed selected CU to the reconstructed residual CU to generate a reconstructed CU, which becomes part of the reconstructed picture data. The reconstructed picture data is stored in a buffer 828 for use by the intra-prediction component 824 and is provided to an in-loop filter component 816.

The in-loop filter component 816 applies various filters to the reconstructed picture data to improve the quality of the reference picture data used for encoding/decoding of subsequent pictures. FIG. 9 shows the in-loop filter component 816 in more detail. The filters included in the in-loop filter component 816 include a deblocking filter 904, a sample adaptive offset filter (SAO) 906, and an adaptive loop filter (ALF) 908. The in-loop filter component 816 may apply the various filters, for example, on an LCU-by-LCU basis. The three filters may be applied sequentially as shown in FIG. 9. That is, the deblocking filter 904 may be first applied to the reconstructed data. Then, the SAO 906 may be applied to the deblocked reconstructed picture data, and the ALF 908 is applied to the SAO filtered reconstructed picture data. Referring again to FIG. 8, the final filtered reference picture data is provided to the storage component 818.

In general, the deblocking filter 904 operates to smooth discontinuities at block boundaries, i.e., TU and CU block boundaries, in a reconstructed picture. In general, the ALF 908 implements an adaptive Wiener filtering technique to minimize distortion in the reconstructed picture as compared to the original picture. The ALF 908 may be region-based. That is, the ALF 908 may select a filter type for a picture from a predefined set of filter types, determine one or more sets of filter coefficients for the selected filter type, divide the picture into filtering regions, and select a set of filter coefficients for each region to be used in applying the filter type to the pixels. The ALF 908 also applies the selected filter to each region using the set of coefficients selected for the region. The selected filter type and the coefficient sets are sent to the entropy encoder 834 to be signaled to the decoder. Embodiments of the ALF 908 may implement techniques for region-based processing and/or signaling of ALF parameters as described herein in reference to the method of FIG. 12.

In general, the SAO filter 906 determines the best offset values, i.e., band offset values or edge offset values, to be added to pixels of a reconstructed picture to compensate for intensity shift that may have occurred during the block based coding of the picture and applies the offset values to the reconstructed picture. The SAO filter 906 may be region-based. That is, the SAO filter 906 may divide the picture into filtering regions, determine the best offset values for each filtering region, and apply those offset values to the region. The region offset values, along with other parameters that together indicate the type of filtering to be applied, i.e., BO or EO, and any other information needed in order to apply the offset values, are sent to the entropy encoder 834 to be signaled to the decoder. Embodiments of the SAO filter 906 may implement techniques for region-based processing and/or signaling of SAO parameters as described herein in reference to the method of FIG. 12.

FIG. 10 shows a block diagram of an example video decoder. The video decoder operates to reverse the encoding operations, i.e., entropy coding, quantization, transformation, and prediction, performed by the video encoder of FIG. 8 to regenerate the pictures of the original video sequence. In view of the above description of a video encoder, one of ordinary skill in the art will understand the functionality of components of the video decoder without detailed explanation.

The entropy decoding component 1000 receives an entropy encoded (compressed) video bit stream and reverses the entropy coding to recover the encoded PUs and header information such as the prediction modes, the encoded CU and PU structures of the LCUs, ALF parameters such as the filter types and filter coefficient set(s), and SAO parameters. If the decoded prediction mode is an inter-prediction mode, the entropy decoder 1000 then reconstructs the motion vector(s) as needed and provides the motion vector(s) to the motion compensation component 1010.

The inverse quantization component 1002 de-quantizes the quantized transform coefficients of the residual CU. The inverse transform component 1004 transforms the frequency domain data from the inverse quantization component 1002 back to the residual CU. That is, the inverse transform component 1004 applies an inverse unit transform, i.e., the inverse of the unit transform used for encoding, to the dequantized residual coefficients to produce the residual CUs.

A residual CU supplies one input of the addition component 1006. The other input of the addition component 1006 comes from the mode switch 1008. When an inter-prediction mode is signaled in the encoded video stream, the mode switch 1008 selects predicted PUs from the motion compensation component 1010, and when an intra-prediction mode is signaled, the mode switch selects predicted PUs from the intra-prediction component 1014.

The motion compensation component 1010 receives reference data from storage 1012 and applies the motion compensation computed by the encoder and transmitted in the encoded video bit stream to the reference data to generate a predicted PU. That is, the motion compensation component 1010 uses the motion vector(s) from the entropy decoder 1000 and the reference data to generate a predicted PU.

The intra-prediction component 1014 receives reconstructed samples from previously reconstructed PUs of a current picture from the buffer 1007 and performs the intra-prediction computed by the encoder as signaled by an intra-prediction mode transmitted in the encoded video bit stream, using the reconstructed samples as needed to generate a predicted PU.

The addition component 1006 generates a reconstructed CU by adding the predicted PUs selected by the mode switch 1008 and the residual CU. The output of the addition component 1006, i.e., the reconstructed CUs, supplies the input of the in-loop filter component 1016 and is also stored in the buffer 1007 for use by the intra-prediction component 1014.

The in-loop filter component 1016 applies the same filters to the reconstructed picture data as the encoder, i.e., a deblocking filter, an SAO, and an ALF, in the same order, to improve the quality of the reconstructed picture data. The output of the in-loop filter component 1016 is the decoded pictures of the video bit stream. Further, the output of the in-loop filter component 1016 is stored in storage 1012 to be used as reference data by the motion compensation component 1010.

FIG. 11 shows the in-loop filter component 1016 in more detail. The filters included in the in-loop filter component 1016 include a deblocking filter 1104, a sample adaptive offset filter (SAO) 1106, and an adaptive loop filter (ALF) 1108. The deblocking filter 1104 operates in the same manner as the deblocking filter of the encoder. The ALF 1108 may be region-based and applies the filter type and the region specific coefficients signaled by the encoder to filtering regions of a picture. Embodiments of the ALF 1108 may implement techniques for region-based filtering as described herein in reference to the method of FIG. 17. The SAO filter 1106 may be region-based and applies the region specific offsets signaled by the encoder to filtering regions of a picture. Embodiments of the SAO filter 1106 may implement techniques for region-based filtering as described herein in reference to the method of FIG. 17.

FIG. 12 is a flow diagram of a method for region based in-loop filtering that may be performed in a video encoder, e.g., the encoder of FIG. 8. As is explained in more detail below, in some embodiments the in-loop filtering may be SAO filtering, and in some embodiments the in-loop filtering may be ALF. Referring now to FIG. 12, filter parameters for filtering regions of a reconstructed picture are determined 1200 for the type of in-loop filtering to be performed, e.g., ALF or SAO filtering. The filter parameters may be determined independently for each of the filtering regions. More specifically, the filter parameters whose values are determined based on reconstructed pixel values (e.g., selection of a filter coefficient set for ALF or offset values in SAO filtering) are determined for each region based on the pixels of that region alone. In some embodiments, after the filter parameters for the filtering regions are determined, a merging scheme may be applied to determine if filtering regions can be combined and a single set of filter parameters sent for the combined regions. The determination of SAO filter parameters for each of the filtering regions may be similar to that used for determining the parameters for a single region in quadtree based SAO. The determination of ALF filter parameters for each of the filtering regions may be similar to that used in the known versions of region based ALF.

The filtering regions may be formed by partitioning the LCUs of the reconstructed picture into M×N LCU aligned regions, where M and N are integers. In some embodiments, the filtering regions are formed by partitioning the LCUs of the reconstructed picture into 4×4 LCU aligned regions. Such partitioning is previously described herein and is illustrated in FIGS. 4 and 6.
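For illustration, a raster-scan LCU address may be mapped to its filtering region as in the following sketch; the names are hypothetical, and partial LCUs at picture edges are ignored for simplicity.

    /* Illustrative mapping from a raster-scan LCU address to its
     * filtering region for an MxN LCU aligned partitioning. For the
     * picture of FIG. 6 (16x16 LCUs, 4x4 partitioning), each region
     * is 4 LCUs wide and 4 LCUs high. */
    static int lcu_to_region(int lcu_addr, int pic_width_in_lcus,
                             int region_w_in_lcus, int region_h_in_lcus,
                             int regions_per_row)
    {
        int lcu_x = lcu_addr % pic_width_in_lcus;
        int lcu_y = lcu_addr / pic_width_in_lcus;
        return (lcu_y / region_h_in_lcus) * regions_per_row
             + (lcu_x / region_w_in_lcus);
    }
    /* lcu_to_region(51, 16, 4, 4, 4) == 0: LCU 51 lies in region R0 */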

In some embodiments, the filtering regions are formed by partitioning the LCUs of the reconstructed picture into 16×1 LCU aligned regions as shown in FIG. 13. The width of each of these filtering regions is the width of the picture, and the height may be determined as follows:

yHeight = (((PicHeightInLCUs + 1) >> 4) << Log2LCUSize)

where Log2LCUSize is log2(LCUSize), e.g., if the LCU size is 64, Log2LCUSize will be 6. These horizontal filtering regions may be better suited to low latency applications than the 4×4 LCU aligned regions, as their shape better aligns with raster-scan order LCU processing. That is, the deblocked LCUs of filtering region R0 will be available before any of the deblocked LCUs of filtering region R1. Thus, the determination of filtering parameters for R0 can be started while the encoder is working on the LCUs of the next region. Consider the simple example of FIG. 14. In this example, a picture with 16 rows of 16 LCUs is divided into the 16 LCU aligned 16×1 filtering regions. Assuming raster scan order, before the determination of the filter parameters for filtering region R0 can begin, only the deblocked LCUs 0-15 need to be available. Contrast this with the 4×4 LCU aligned regions in the example of FIG. 6. The deblocked LCUs of filtering region R0 in FIG. 6 will not be available until at least LCUs 0-51 are processed by the encoder.
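The height formula and the latency observation may be made concrete with a short sketch; the function name is illustrative.

    /* Transcription of the height formula above; each 16x1 region
     * spans the full picture width. */
    static int region_height_16x1(int pic_height_in_lcus,
                                  int log2_lcu_size)
    {
        return ((pic_height_in_lcus + 1) >> 4) << log2_lcu_size;
    }
    /* For FIG. 14 (16 LCU rows of 64x64 LCUs):
     * ((16 + 1) >> 4) << 6 == 64, i.e., each region is exactly one
     * row of LCUs, so region R0 needs only the deblocked LCUs 0-15. */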

Referring again to FIG. 12, the in-loop filtering is applied 1202 to the filtering regions according to the filter parameters. More specifically, the in-loop filtering of the type for which the region filter parameters were determined, e.g., ALF or SAO filtering, is performed on each filtering region according to the filter parameters determined for that filtering region. Note that each filtering region can be filtered independently as there is no dependency on other regions. Further, the filtering can be performed as soon as the filtering parameters are determined.

The filter parameters for each region are also signaled 1204 in the encoded video bit stream. In some embodiments, the filter parameters are signaled in the slice header of the slice in which the LCU data of the regions is encoded. In some embodiments, the filter parameters for each region are signaled at the end of the encoded region data. That is, the filter parameters for a region are signaled following the encoded data of the last LCU in the region. For example, consider the example 4×4 LCU aligned region configuration of FIG. 6. The filter parameters for each of these regions would be signaled as shown in the example of FIG. 15. Note that the filter parameters for filtering region R0 are signaled after the data of LCU 51, as that is the last LCU in filtering region R0. Also note that some delay is incurred before these parameters can be signaled due to the raster scan ordering of LCU data in the encoded bit stream. In another example, consider the example 16×1 LCU aligned region configuration of FIG. 14. The filter parameters for each of these regions would be signaled as shown in the example of FIG. 16. Due to the alignment of the filtering regions with the raster scan LCU ordering, the delay incurred in signaling the filter parameters for filtering region R0 in this example will be less than that incurred for filtering region R0 in FIG. 15.
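An illustrative sketch of this interleaved ordering is given below. In raster-scan order, the final LCU of a rectangular filtering region is its bottom-right LCU, so the region parameters are written immediately after that LCU (LCU 51 for region R0 in FIG. 15); write_lcu_data() and write_region_filter_params() are placeholders, not HEVC syntax.

    void write_lcu_data(int lcu_addr);              /* placeholder */
    void write_region_filter_params(int lcu_addr);  /* placeholder */

    static int is_final_lcu_of_region(int addr, int pic_w, int pic_h,
                                      int rw, int rh)
    {
        int x = addr % pic_w, y = addr / pic_w;     /* in LCU units */
        int rightmost  = (x % rw == rw - 1) || (x == pic_w - 1);
        int bottommost = (y % rh == rh - 1) || (y == pic_h - 1);
        return rightmost && bottommost;
    }

    static void write_slice_data(int pic_w, int pic_h, int rw, int rh)
    {
        for (int addr = 0; addr < pic_w * pic_h; addr++) {
            write_lcu_data(addr);
            if (is_final_lcu_of_region(addr, pic_w, pic_h, rw, rh))
                write_region_filter_params(addr);
        }
    }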

FIG. 17 is a flow diagram of a method for region based in-loop filtering that may be performed in a video decoder, e.g., the decoder of FIG. 10. As is explained in more detail below, in some embodiments the in-loop filtering may be SAO filtering, and in some embodiments the in-loop filtering may be ALF. Referring now to FIG. 17, signaled filter parameters for filtering regions of a picture are decoded 1700 from the encoded bit stream. In some embodiments, the filter parameters are for SAO filtering of the filtering regions. In some embodiments, the filter parameters are for ALF filtering of the filtering regions.

In some embodiments, the filter parameters for the filtering regions are decoded from a slice header prior to decoding the LCU data of the filtering regions. In some embodiments, the filter parameters for each of the filtering regions are encoded in the bit stream following the data of the last LCU in the filtering region, as previously described. In such embodiments, the filter parameters for a filtering region are decoded after all the LCU data of the filtering region. Note that this avoids the delay incurred when the filter parameters are encoded in the slice header.

In-loop filtering, e.g., ALF or SAO filtering, is applied 1702 to the filtering regions according to the filter parameters for each filtering region. The configuration of the filtering regions, e.g., 4×4 LCU aligned or 16×1 LCU aligned, may be known to the decoder or may be signaled in the encoded bit stream.

FIG. 18 is a block diagram of an example digital system suitable for use as an embedded system that may be configured to perform region-based in-loop filtering as described herein during encoding of a video stream and/or region-based in-loop filtering during decoding of an encoded video bit stream. This example system-on-a-chip (SoC) is representative of one of a family of DaVinci™ Digital Media Processors, available from Texas Instruments, Inc. This SoC is described in more detail in “TMS320DM6467 Digital Media System-on-Chip,” SPRS403G, December 2007 or later, which is incorporated by reference herein.

The SoC 1800 is a programmable platform designed to meet the processing needs of applications such as video encode/decode/transcode/transrate, video surveillance, video conferencing, set-top box, medical imaging, media server, gaming, digital signage, etc. The SoC 1800 provides support for multiple operating systems, multiple user interfaces, and high processing performance through the flexibility of a fully integrated mixed processor solution. The device combines multiple processing cores with shared memory for programmable video and audio processing with a highly-integrated peripheral set on a common integrated substrate.

The dual-core architecture of the SoC 1800 provides benefits of both DSP and Reduced Instruction Set Computer (RISC) technologies, incorporating a DSP core and an ARM926EJ-S core. The ARM926EJ-S is a 32-bit RISC processor core that performs 32-bit or 16-bit instructions and processes 32-bit, 16-bit, or 8-bit data. The DSP core is a TMS320C64x+™ core with a very-long-instruction-word (VLIW) architecture. In general, the ARM is responsible for configuration and control of the SoC 1800, including the DSP Subsystem, the video data conversion engine (VDCE), and a majority of the peripherals and external memories. The switched central resource (SCR) is an interconnect system that provides low-latency connectivity between master peripherals and slave peripherals. The SCR is the decoding, routing, and arbitration logic that enables the connection between multiple masters and slaves that are connected to it.

The SoC 1800 also includes application-specific hardware logic, on-chip memory, and additional on-chip peripherals. The peripheral set includes: a configurable video port (Video Port I/F), an Ethernet MAC (EMAC) with a Management Data Input/Output (MDIO) module, a 4-bit transfer/4-bit receive VLYNQ interface, an inter-integrated circuit (I2C) bus interface, multichannel audio serial ports (McASP), general-purpose timers, a watchdog timer, a configurable host port interface (HPI), general-purpose input/output (GPIO) with programmable interrupt/event generation modes multiplexed with other peripherals, UART interfaces with modem interface signals, pulse width modulators (PWM), an ATA interface, a peripheral component interface (PCI), and external memory interfaces (EMIFA, DDR2). The video port I/F is a receiver and transmitter of video data with two input channels and two output channels that may be configured for standard definition television (SDTV) video data, high definition television (HDTV) video data, and raw video data capture.

As shown in FIG. 18, the SoC 1800 includes two high-definition video/imaging coprocessors (HDVICP) and a video data conversion engine (VDCE) to offload many video and image processing tasks from the DSP core. The VDCE supports video frame resizing, anti-aliasing, chrominance signal format conversion, edge padding, color blending, etc. The HDVICP coprocessors are designed to perform computational operations required for video encoding such as motion estimation, motion compensation, intra-prediction, transformation, and quantization. Further, the distinct circuitry in the HDVICP coprocessors that may be used for specific computation operations is designed to operate in a pipeline fashion under the control of the ARM subsystem and/or the DSP subsystem.

As was previously mentioned, the SoC 1800 may be configured to perform region based in-loop filtering during video encoding and/or region based in-loop filtering during decoding of an encoded video bit stream using methods described herein. For example, the coding control of the video encoder of FIG. 8 may be executed on the DSP subsystem or the ARM subsystem, and at least some of the computational operations of the block processing, including the intra-prediction and inter-prediction of mode selection, transformation, quantization, and entropy encoding, may be executed on the HDVICP coprocessors. At least some of the computational operations of the region based in-loop filtering during encoding of a video stream may also be executed on the HDVICP coprocessors. Similarly, at least some of the computational operations of the various components of the video decoder of FIG. 10, including entropy decoding, inverse quantization, inverse transformation, intra-prediction, and motion compensation, may be executed on the HDVICP coprocessors. Further, at least some of the computational operations of the region based in-loop filtering during decoding of an encoded video bit stream may also be executed on the HDVICP coprocessors.

Other Embodiments

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein.

For example, one of ordinary skill in the art will understand embodiments in which ALF uses only one filter type rather than selecting a filter type from multiple filter types.

In another example, particular SAO filter types, edge directions, pixel categories, numbers of offset values, etc., drawn from versions of the emerging HEVC standard have been described herein. One of ordinary skill in the art will understand embodiments in which the SAO filter types, edge directions, pixel categories, numbers of offset values, and/or other specific details of SAO filtering differ from the ones described.

In another example, embodiments of the invention are described in which filter parameters for filtering regions are signaled after the regions in the encoded bit stream. This signaling technique may reduce delay in the encoder at the expense of increasing delay in the decoder. In some embodiments, the filter parameters for each filtering region may be signaled before the regions rather than after. This signaling technique may increase delay in the encoder while decreasing delay in the decoder.

Embodiments of the methods, encoders, and decoders described herein may be implemented in hardware, software, firmware, or any combination thereof. If completely or partially implemented in software, the software may be executed in one or more processors, such as a microprocessor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), or digital signal processor (DSP). The software instructions may be initially stored in a computer-readable medium and loaded and executed in the processor. In some cases, the software instructions may also be sold in a computer program product, which includes the computer-readable medium and packaging materials for the computer-readable medium. In some cases, the software instructions may be distributed via removable computer readable media, via a transmission path from computer readable media on another digital system, etc. Examples of computer-readable media include non-writable storage media such as read-only memory devices, writable storage media such as disks, flash memory, memory, or a combination thereof.

Although method steps may be presented and described herein in a sequential fashion, one or more of the steps shown in the figures and described herein may be performed concurrently, may be combined, and/or may be performed in a different order than the order shown in the figures and/or described herein. Accordingly, embodiments should not be considered limited to the specific ordering of steps shown in the figures and/or described herein.

It is therefore contemplated that the appended claims will cover any such modifications of the embodiments as fall within the true scope of the invention.

What is claimed is:
 1. A method comprising: determining filter parameters for each filtering region of a plurality of filtering regions of a reconstructed picture based on pixels in the respective filtering region; applying in-loop filtering to each filtering region according to the filter parameters determined for the respective filtering region; and signaling the filter parameters for each filtering region in an encoded video bit stream, wherein the filter parameters for at least one of the respective filtering regions are signaled between encoded data of a final coding unit (CU) in the respective filtering region and encoded data of a CU that is signaled next in the encoded video bit stream after the final CU, wherein the in-loop filtering is selected from a group consisting of adaptive loop filtering and sample adaptive offset filtering.
 2. The method of claim 1, wherein the plurality of filtering regions are determined by partitioning CUs of the reconstructed picture into 4×4 CU aligned regions.
 3. The method of claim 1, wherein the plurality of filtering regions are determined by partitioning CUs of the reconstructed picture into N×1 CU aligned regions, wherein N is an integer.
 4. The method of claim 3, wherein N=16.